Ranking Outliers Using Symmetric Neighborhood Relationship

نویسندگان

  • Wen Jin
  • Anthony K. H. Tung
  • Jiawei Han
  • Wei Wang
چکیده

Mining outliers in database is to find exceptional objects that deviate from the rest of the data set. Besides classical outlier analysis algorithms, recent studies have focused on mining local outliers, i.e., the outliers that have density distribution significantly different from their neighborhood. The estimation of density distribution at the location of an object has so far been based on the density distribution of its k-nearest neighbors [2, 11]. However, when outliers are in the location where the density distributions in the neighborhood are significantly different, for example, in the case of objects from a sparse cluster close to a denser cluster, this may result in wrong estimation. To avoid this problem, here we propose a simple but effective measure on local outliers based on a symmetric neighborhood relationship. The proposed measure considers both neighbors and reverse neighbors of an object when estimating its density distribution. As a result, outliers so discovered are more meaningful. To compute such local outliers efficiently, several mining algorithms are developed that detects top-n outliers based on our definition. A comprehensive performance evaluation and analysis shows that our methods are not only efficient in the computation but also more effective in ranking outliers.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Fuzzy Robust Regression Analysis with Fuzzy Response Variable and Fuzzy Parameters Based on the Ranking of Fuzzy Sets

‎Robust regression is an appropriate alternative for ordinal regression when outliers exist in a given data set‎. ‎If we have fuzzy observations‎, ‎using ordinal regression methods can't model them; In this case‎, ‎using fuzzy regression is a good method‎. ‎When observations are fuzzy and there are outliers in the data sets‎, ‎using robust fuzzy regression methods are appropriate alternatives‎....

متن کامل

Algorithms for Spatial Outlier Detection

A spatial outlier is a spatially referenced object whose non-spatial attribute values are significantly different from the values of its neighborhood. Identification of spatial outliers can lead to the discovery of unexpected, interesting, and useful spatial patterns for further analysis. One drawback of existing methods is that normal objects tend to be falsely detected as spatial outliers whe...

متن کامل

Local Learning for Mining Outlier Subgraphs from Network Datasets

In the real world, various systems can be modeled using entity-relationship graphs. Given such a graph, one may be interested in identifying suspicious or anomalous subgraphs. Specifically, a user may want to identify suspicious subgraphs matching a query template. A subgraph can be defined as anomalous based on the connectivity structure within itself as well as with its neighborhood. For exam...

متن کامل

Robust Outlier Detection Using Commute Time and Eigenspace Embedding

We present a method to find outliers using ‘commute distance’ computed from a random walk on graph. Unlike Euclidean distance, commute distance between two nodes captures both the distance between them and their local neighborhood densities. Indeed commute distance is the Euclidean distance in the space spanned by eigenvectors of the graph Laplacian matrix. We show by analysis and experiments t...

متن کامل

A Modified Weighted Symmetric Estimator for a Gaussian First-Order Autoregressive Model with Additive Outliers

Guttman and Tiao [1], and Chang [2] showed that the effect of outliers may cause serious bias in estimating autocorrelations, partial correlations, and autoregressive moving average parameters (cited in Chang et al. [3]). This paper presents a modified weighted symmetric estimator for a Gaussian first-order autoregressive AR(1) model with additive outliers. We apply the recursive median adjustm...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006